rstudio::conf2018 was held on February 2nd and 3rd in beautiful sunny San Diego, California ☀️🏄🏼♂️. It was a high energy atmosphere with over 1200 local, regional, national and international users in attendance.
The conference was well organized and staffed. The talks were set up with tables, power cords, and three large monitors to provide a good view for everyone in attendance. The conference seemed on the verge of splurging in regard to location 📌, venue, food 🥝, lounges, professional headshots, t-shirts 👕, goodie bags 💰, and the social functions 🎂. My rstudio::conf experience was a far cry from what I’ve grown accustomed to at academic conferences, where you wonder where your registration fee goes. Well done rstudio::conf!!
— @tanyacash21
— @ZazzValette
The content of this conference was focused on the suite of RStudio products and opportunities for extending and contributing to product development.
1. RStudio - an integrated development environment (IDE) for R. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging, and workspace management.
2. RStudio Supported Packages - The RStudio team contributes code to many R packages and projects, including the tidyverse, shiny, and rmarkdown.
3. For-cost products - RStudio Server, RStudio Server Pro, and RStudio Connect deliver the productivity, security, centralized management, metrics, and commercial support that professional data science teams need to develop at scale.
Note: educational and non-profit rates are available.
The conference packed approximately 60 talks and two keynotes into 2 days, with the option of attending a focused two-day workshop prior to the conference. In the span of 2 days, I attended 2 keynotes, 2 fireside chats, and over 20 talks. Almost too much information!! It was difficult to choose among the 3 concurrent sessions, but my mind was at ease knowing that all sessions were being recorded and would be available after the conference. Recorded sessions are appreciated! After taking a breather and digesting the sessions I attended, I’ll start to explore the treasure trove of content that still remains.
Workshops were offered that catered to beginner, intermediate, and advanced users. The conference talks were primarily primers and demonstrations of tools, resources, and opportunities. For those interested in focused instruction, the workshop materials would be your best bet.
Data Science in the Tidyverse, Charlotte Wickham, @cvwickham
Intro to Shiny & R Markdown, Mine Çetinkaya-Rundel, @minebocek
What They Forgot to Teach You About R (aka Getting S*%! Done in R), Jenny Bryan, @JennyBryan
Tidyverse Train-the-Trainer, Garrett Grolemund, @statgarrett
Recorded streams of the conference are available :
Two people have repos that have accumulated the slides and materials from most of the talks: matthewravey and simecek.
As a brand new twitter user (@ryann_crowley), I was amazed at how quickly folks were posting about session content…many of the tweets were posted WHILE IN THE SESSIONS. A collection of conference-specific tweets can be found under the #rstudioconf hashtag.
To help make sense of the numerous tweets, Garrick Aden-Buie, who wasn’t actually able to attend rstudio::conf2018, created a shiny app for sorting, filtering, and searching them. The app can be applied to any twitter dataset.
To get a sense of the twitter madness, you could explore the rstudio::conf2018 tweets yourself using Mike Kearney’s rtweet package. Mike’s vignette includes some awesome analysis and visualizations of twitter stats from rstudio::conf2018.
I don’t want to get too distracted by the twitter analyses. It would be a fun topic for another meetup. I wanted to give an overview of a few of my favorite talks, and some general themes that I took away from the conference.
tidyverse and beyond: Challenges for the Future in Data Science
A major takeaway from the keynote was the evolution of plots to be used for inference. Could you imagine making inferential decisions based on the visualization of the data?
“Now we can start doing statistics with plots, actually statistical inference”
It’s a good old-fashioned police line-up!! Make your visualization, then scramble the data and make multiple plots. If someone unfamiliar with the data can’t pick the plot of the actual data out of the scrambled ones as the most striking, maybe there’s no important relationship!
Is there a package for that…you bet!
nullabor lets you create a lineup of your data visualizations without looking at the data first. The example above creates a lineup of 12 plots (one of which shows the real data).
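For anyone who wants to try the lineup idea at home, here is a minimal sketch using nullabor's `lineup()` and `null_permute()`. The `mtcars` data, the choice of variables, and the object names are my own assumptions for illustration; they are not from the keynote.

```r
library(nullabor)
library(ggplot2)

set.seed(2018)

# Pick the panel that will hold the real data (kept aside, not shown on the plot).
true_pos <- sample(12, 1)

# Build 12 data sets: 11 with `mpg` permuted (breaking any mpg/wt relationship)
# and 1 containing the real data, placed at position `true_pos`.
lineup_data <- lineup(null_permute("mpg"), true = mtcars, n = 12, pos = true_pos)

# One panel per data set; `.sample` is the panel id added by lineup().
p <- ggplot(lineup_data, aes(mpg, wt)) +
  geom_point() +
  facet_wrap(~ .sample)
print(p)
```

Show the plot to someone who has not seen the data and ask which panel stands out. If they pick `true_pos`, the pattern in the real data is probably more than noise.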
Last note, regarding the importance of visualization,
What her protégé had to say about it…“You can get statistical significance really cheaply these days” due to large sample sizes (and ability to work with them) – Di Cook
— @hadleywickham
tidyverse to beginners
Note: the tidyverse is a set of cohesive R packages.
The key to teaching R…
“Have goals for what you want your students to do, and start them doing it as early as possible.” David Robinson
Why tidyverse?
It’s intuitive
It’s powerful
Students can quickly explore their data and create visualizations
Its overall language is consistent across functions within packages
Traditional learning curve for R, when instruction began with data structures and programming statements.
The tidyverse can closely approximate the blue dashed learning curve…actually inverting the historical learning curve of R.
tidyverse first).
tidyverse
The following set of functions within the tidyverse (notation package::function) were discussed.
A quick reminder: documentation for functions and packages can be googled and/or displayed in the RStudio help window by typing ?package::function or ?function
Listed below are a few details for three functions.
skimr will provide a concise set of descriptive statistics (plus histograms for continuous variables) for your entire dataset. Listed below is a sample output and a nice caricature of the joy it can induce.
fct_relevel allows you to specify the order of an ordinal variable (a variable with response options on a continuum, for example “less” to “more” or “high” to “low”). By default, R orders the levels of a character variable alphabetically. In the example below, fct_relevel has been used to order the response options from least to most frequent (“Rarely” to “Most of the time”).
ggplot(aes(response)) + geom_bar()
Solution: fct_relevel() to manually order the factor
ggplot(aes(x = fct_relevel(response, "Rarely", "Sometimes", "Often", "Most of the time"))) + geom_bar()
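The two snippets above assume a data frame is piped into `ggplot()`. Here is a self-contained sketch of the same idea. The `survey` data frame is made up for illustration and is not the data from the talk.

```r
library(forcats)
library(ggplot2)

# Hypothetical survey responses (illustrative only).
survey <- data.frame(
  response = c("Often", "Rarely", "Sometimes", "Most of the time",
               "Sometimes", "Often", "Sometimes", "Rarely"),
  stringsAsFactors = FALSE
)

# Default behaviour: factor levels (and therefore the bars) are alphabetical.
default_levels <- levels(factor(survey$response))

# fct_relevel() moves the named levels to the front, in the order given.
survey$response_ordered <- fct_relevel(
  factor(survey$response),
  "Rarely", "Sometimes", "Often", "Most of the time"
)

p <- ggplot(survey, aes(x = response_ordered)) + geom_bar()
print(p)
```

The bars now run from "Rarely" to "Most of the time" instead of starting with "Most of the time".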
Interactive graphics augment exploration and allow one to search information quickly without fully specified questions (Unwin & Hoffmann, 2000). Multiple linked views are the optimal framework for posing queries about data (Buja, Cook, & Swayne 1996).
With crosstalk you can create standalone HTML output that does not require a web server!!
crosstalk examples:
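As a rough sketch of what linked views look like in code, the block below follows the pattern in the crosstalk documentation: wrap the data in a `SharedData` object and hand that same object to each widget. The choice of the built-in `quakes` data, the leaflet map, and the DT table are my assumptions, not the examples shown in the talk.

```r
library(crosstalk)
library(leaflet)
library(DT)

set.seed(2018)

# Wrap a sample of the built-in `quakes` data so widgets can share
# selection and filter state.
shared <- SharedData$new(quakes[sample(nrow(quakes), 100), ])

# Each widget receives the same SharedData object, which is what links them.
slider <- filter_slider("mag", "Magnitude", shared, column = ~mag, step = 0.1)
map    <- addMarkers(addTiles(leaflet(shared)))
tbl    <- datatable(shared, width = "100%")

# Lay the pieces out side by side; in an R Markdown document this knits
# to a standalone HTML page with no Shiny server behind it.
page <- bscols(slider, map, tbl)
```

Moving the slider filters both the map and the table, and all of that happens in the browser.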
R and/or supporting the use of R as an organization
The Mayo Clinic switched to R after a SAS licensing increase of 10x!!
Mayo’s story….
How to increase #rstats adoption from SAS? Create packages with SAS macros, use RStudio, and support training for new users! Make R fun! Increase motivation with the awesome things R can do: R Markdown and Shiny for example.
Pinnacle is…
Training an Army of Data Scientists at Pinnacle using DataCamp
R mastery based on work requirements, with specific DataCamp courses assigned accordingly
Lessons Learned
R - Sandy Griffith
Recap by Sharla Gelfand
Sandy covered why Flatiron chose R over SAS for her team, the steps that went into that choice, and then cultivating and sustaining a strong R team. Cultivating started with a lot of support – an internal R package, user group, Slack channels, training, and hiring. She then focused on sustaining – once everyone is proficient, there’s a need to focus on consistency and contribution, via growing internal packages, and focusing on reproducibility, quality control, and standardization.
She acknowledged that there are challenges now – devoting time to infrastructure, internal package management, and coordinating R usage outside of the Quantitative Sciences team among them, all areas of potential improvement for a company that is quite mature in R.
Sandy is biking through the desert post conf, but I will link her slides (and add more detail on the talk if they jog my memory – I can tell I’m forgetting a lot) once/if they’re available!
— Post of talk by @jent103
tidycf: Turning analysis on its head by turning cash flows on their sides
At Capital One, business analytics had been done in a variety of tools (spreadsheets, PowerPoint, word processors, databases, BI tools, etc.) with poor documentation and reproducibility.
After noticing that nuanced business decisions are driven by remarkably standard analytical tools and techniques, Capital One created an internal tidy cash flow (tidycf) package to train new users, improve efficiency and reproducibility, and reduce error. After consolidating and reorganizing the data at Capital One, a single R package was developed to run statistical analyses alongside cash flow analyses (with everyone using the same dataset!!)
tidycf) to match user/organizational needs:
R with the templates
The goal was to get engagement from the community and, once people were comfortable, convert the community from users to developers
Mission = Success!! Capital One has increased the adoption of R and the creation of new tools across the company.
R Administrator might be the next budding position in the data science workforce
Worthy note on “change management” and “transition management”
@earino on bringing people from SAS or excel to #rstats: “change management is about whether new tools are available. But there’s an identity issue too: people were an expert in their tool and now they are going to be a beginner. That’s transition management” #rstudioconf
A tour of the new features that became part of the v1.1 release of the RStudio IDE:
A Terminal tab, giving you access to a shell directly within the IDE,
An Object Explorer, for inspecting deeply-nested R objects,
A modern theme, including a dark IDE theme to accompany the dark editor themes,
A Connections tab, for managing connections to external SQL data stores,
Improvements to Git integration, making it easier to manage git branches from within the IDE, and
A few useful quality of life features
Color in the Console
Addins list is searchable
Previous commands are searchable: press Ctrl + R with the cursor in the Console pane
In RStudio Connect, you can create one report with a few input options (parameters) to filter/sort the underlying data, creating a dynamic report without changing the actual report!!
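The parameter mechanism itself comes from R Markdown, so you can try it locally without Connect. Below is a small sketch. The report contents, the `cyl` parameter, and the use of `mtcars` are all made up for illustration, and the render step only runs if pandoc is available.

```r
library(rmarkdown)

# Write a tiny parameterized report to a temporary file.
# `fence` avoids typing literal code-fence markers inside this example.
fence <- strrep("`", 3)
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Cars by cylinder count\"",
  "output: html_document",
  "params:",
  "  cyl: 4",
  "---",
  "",
  paste0(fence, "{r}"),
  "subset(mtcars, cyl == params$cyl)",
  fence
), rmd)

# Inspect the parameters (and their defaults) declared in the YAML header.
declared <- knitr::knit_params(readLines(rmd))

# Render the same report with a different parameter value; the .Rmd is untouched.
if (pandoc_available()) {
  out <- render(rmd, params = list(cyl = 6), quiet = TRUE)
}
```

Each call to `render()` with a different `params` list produces a different version of the report from the same source file.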
RStudio Connect interface will send reports on a schedule and keep a history of reports generated (all other Connect features are also available)
https://rviews.rstudio.com/2017/06/21/analytics-administration-for-r/
R Admin is a position and set of tasks that have until now fallen on the shoulders of a willing, IT-literate R user.
The R admin role is to work with IT to:
R as the analytic standard
R tooling
R into other systems
I had no idea stickers were so interesting or important…and I may have let you down. Check out the sticker hype below. Next time, I’ll make an effort to bring back stickers for everyone!!
— @JoeDot2
— @tpsteiner
— @shermstats
— @CivicAngela
— @ellisvalentiner
This is just a brief recap of a few talks. Not discussed due to time were a keynote on utilizing TensorFlow for deep learning algorithms and additional talks on unpacking vectors, data “rectangling”, debugging techniques, drill-down reporting with shiny, modeling in the tidyverse, and working with firewalls…all were worthy candidates for discussion and I’m sure the same is true for the talks I was unable to attend. I hope this brief but spectacular presentation introduced you to a set of new adventures.
R has moved well beyond academia.
I met an employee of the British Columbia government and learned they use R, RStudio, and GitHub to support free and open source projects. They provide public access to all of their work, including their bcgov GitHub organization and an R package, bcmaps, that contains the spatial map layers for the province.
Registration is open for rstudio::conf2019 (1/15-1/19) in Austin, TX.